Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables
Abstract
As more data migrates to the cloud and the same files become accessible from many different machines, finding effective ways to keep data consistent is increasingly important. In this thesis, we survey current methods for efficiently reconciling sets of objects stored on different machines without the use of logs or other prior context, a task known as the set reconciliation problem. We also discuss the state of the art in file synchronization, including methods that use set reconciliation as an intermediate step. We explain the design and implementation of a novel file synchronization protocol that is tailored to minimize transmission complexity and targeted at files with relatively few changes, and we propose an extension of this protocol to more general directory synchronization. We describe IBLTsync¹, our implementation of this protocol, and benchmark it against a naïve file transmission protocol and rsync, a popular file synchronization utility. We find that for files with relatively few changes, IBLTsync transmits significantly less data than the naïve protocol and moderately less data than rsync. In addition, we provide the first (to our knowledge) implementation of multi-party set reconciliation using Invertible Bloom Lookup Tables, a hash-based data structure, and evaluate its performance for message propagation in large networks.

¹ All relevant code is available at https://github.com/mgentili/SetReconciliation
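The abstract names Invertible Bloom Lookup Tables (IBLTs) as the data structure behind set reconciliation. Below is a minimal Python sketch of a generic, key-only IBLT, not the IBLTsync code: each cell holds a count, an XOR of keys, and an XOR of key checksums. It assumes non-negative 64-bit integer keys, three hash functions over partitioned sub-tables, and BLAKE2b-derived hashes; the names `IBLT`, `subtract`, `decode` and `CHECK_SALT` are hypothetical. Each party builds a table of its set, one table is subtracted from the other so that shared keys cancel, and the symmetric difference is recovered by repeatedly "peeling" cells that hold exactly one key.

```python
import hashlib


def _hash(key: int, salt: int) -> int:
    """Deterministic 64-bit hash of a non-negative 64-bit integer key under a salt."""
    data = salt.to_bytes(4, "little") + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


CHECK_SALT = 0xC0FFEE  # hypothetical salt reserved for the per-key checksum


class IBLT:
    """Key-only Invertible Bloom Lookup Table over 64-bit integer keys (sketch)."""

    def __init__(self, num_cells: int, num_hashes: int = 3):
        self.k = num_hashes
        self.sub = num_cells // num_hashes  # each hash function gets its own sub-table
        self.m = self.sub * num_hashes
        self.count = [0] * self.m
        self.key_sum = [0] * self.m    # XOR of all keys hashed to the cell
        self.check_sum = [0] * self.m  # XOR of those keys' checksums

    def _cells(self, key: int) -> list:
        # One cell per sub-table, so a key always touches k distinct cells.
        return [i * self.sub + _hash(key, i) % self.sub for i in range(self.k)]

    def _update(self, key: int, delta: int) -> None:
        check = _hash(key, CHECK_SALT)
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.check_sum[c] ^= check

    def insert(self, key: int) -> None:
        self._update(key, +1)

    def subtract(self, other: "IBLT") -> "IBLT":
        """Cell-wise difference; keys present in both tables cancel out."""
        assert (self.m, self.k) == (other.m, other.k), "tables must share parameters"
        diff = IBLT(self.m, self.k)
        for c in range(self.m):
            diff.count[c] = self.count[c] - other.count[c]
            diff.key_sum[c] = self.key_sum[c] ^ other.key_sum[c]
            diff.check_sum[c] = self.check_sum[c] ^ other.check_sum[c]
        return diff

    def _is_pure(self, c: int) -> bool:
        # A pure cell holds exactly one key from one side of the difference.
        return (self.count[c] in (1, -1)
                and self.check_sum[c] == _hash(self.key_sum[c], CHECK_SALT))

    def decode(self):
        """Peel pure cells (destroys the table). Returns (ok, only_in_self, only_in_other)."""
        only_self, only_other = set(), set()
        stack = [c for c in range(self.m) if self._is_pure(c)]
        while stack:
            c = stack.pop()
            if not self._is_pure(c):
                continue  # emptied or changed by an earlier peel
            key, sign = self.key_sum[c], self.count[c]
            (only_self if sign == 1 else only_other).add(key)
            self._update(key, -sign)  # remove the key from all of its cells
            stack.extend(self._cells(key))  # neighbours may have become pure
        ok = not any(self.count) and not any(self.key_sum) and not any(self.check_sum)
        return ok, only_self, only_other


# Example: Alice and Bob each hold 1,000 keys and differ in 20 of them.
alice = set(range(1, 1001))
bob = (alice - set(range(1, 11))) | set(range(5001, 5011))

table_a, table_b = IBLT(300), IBLT(300)
for x in alice:
    table_a.insert(x)
for x in bob:
    table_b.insert(x)

# Bob would send only table_b (300 cells), not his whole set.
ok, only_alice, only_bob = table_a.subtract(table_b).decode()
print(ok, sorted(only_alice), sorted(only_bob))
```

The table is sized to the expected difference rather than to the sets themselves: with three hash functions, peeling tends to succeed once the number of cells is a small constant factor larger than the number of differing keys, and `decode` reports `ok = False` when it cannot empty the table.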
Similar resources
On the database lookup problem of approximate matching
Investigating seized devices within digital forensics gets more and more difficult due to the increasing amount of data. Hence, a common procedure uses automated file identification which reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also helpful to detect similar data by ...
Algorithms for Synchronizing Command and Control Data in Disconnected, Intermittent and Low-Bandwidth Environments
In this work, we study the problem of reconciling similar sets of Command and Control (C2) data within a distributed, intermittent, and low-bandwidth (DIL) environment. We begin by developing a simple mathematical model that characterizes C2 data synchronization. Using this model, we produce lower bounds on the minimum throughput required between two hosts in order to synchronize their data. A...
Fast and deterministic hash table lookup using discriminative bloom filters
Hash tables are widely used in network applications, as they can achieve O(1) query, insert, and delete operations at moderate loads. However, at high loads, collisions are prevalent in the table, which increases the access time and induces non-deterministic performance. Slow rates and non-determinism can considerably hurt the performance and scalability of hash tables in the multi-threaded par...
Accelerating Boolean Matching Using Bloom Filter
Boolean matching is a fundamental problem in FPGA synthesis, but existing Boolean matchers are not scalable to complex PLBs (programmable logic blocks) and large circuits. This paper proposes a filter-based Boolean matching method, F-BM, which accelerates Boolean matching using lookup tables implemented by Bloom filters storing precalculated matching results. To show the effectiveness of the pr...
Set Reconciliation in Two Rounds of Communication
In this work, we propose an approach, known as the C2SS-BF method, to synchronizing similar sets of data that uses an Invertible Bloom Filter (IBF). The C2SS-BF method builds on previous work by Eppstein et al. in [6]. By allowing two rounds of communication, we show that in many cases the proposed approach requires substantially less throughput than the algorithm proposed in [6]. The C2SS-BF c...
Publication date: 2015